Abstract

We present a model for data reorganization in parallel disk systems that is geared towards load balancing
in an environment with periodic access patterns. Data reorganization is performed by disk cooling, i.e.,
migrating files or extents from the hottest disks to the coldest ones. We develop an approximate queueing 
model for determining the effective arrival rates of cooling requests and discuss its use in assessing the costs 
versus benefits of cooling actions. 


Index Terms: 

database reorganization, load balancing, temporal access patterns, parallel disk systems, approximate queueing
model, I/O service stealing


1 Introduction 


Database reorganization plays an important role in the performance tuning of dynamic systems with evolving
access patterns. In this environment it is highly desirable to invoke an on-line database reorganization
scheme in which the reorganization actions are performed concurrently with regular transactions [2, 3].
Thus, in contrast to off-line reorganization, which is performed while disallowing regular transactions, on-line
reorganization is usually performed incrementally as a lower priority transaction (<]. In a centralized
database system reorganization is performed in order to reduce the access time [2, 3]. In contrast, the main
objective of data reorganization in parallel database systems is load balancing.

We present a new model for database reorganization in parallel database systems which allows the system
to determine at a given point in time whether a reorganization action is cost beneficial or not, given that
the reorganization itself imposes an additional load on the system. Reorganization is performed dynamically
by employing an incremental data migration procedure called disk cooling. Disk cooling migrates data from
"hot" (i.e., heavily utilized) disks to "cold" (i.e., lightly utilized) disks.

Our model differs in a number of aspects from other studies proposed in the literature for determining optimal
reorganization points in centralized database systems [6, 3]. Although the cost of performing a reorganization
was considered, reorganizations were viewed as occurring instantly, thus having no effect on the overall system
load. In addition, these studies assumed that the cost of performing a regular transaction always increases in time
if the database is not in a reorganized state.

Our reorganization model is geared towards a workload in which a substantial proportion of the transactions
exhibit a periodic pattern of access characteristics. In such a case, it may be beneficial to postpone
a reorganization to a later point in time when there are fewer regular transactions. In our system a data
migration request consists of two phases, a read phase, where the hottest disk is accessed, and a write phase,
where the coolest disk is accessed. However, the read phase of a migration action, and hence the entire
cooling action, is not executed if the service queue of the source disk is not empty. Since the source disk
carries the heaviest share of the load, scheduling a reorganization action would most likely increase the
load imbalance if the queue at the source disk is not empty. The model proposed here is a generalization of
the earlier models introduced in [4, 5]. The model used in [5] did not consider periodic access patterns. In
contrast, in [4] we explicitly considered periodic access patterns, but all reorganization actions were treated
as lower priority requests.

We shall refer to the read requests of the reorganization actions under a no-enqueueing policy as I/O
service stealing requests, given their analogy to cycle stealing operations in CPU execution. As part of
our reorganization scheme, we have developed an approximate queueing model for a system with two types
of requests, namely regular requests and I/O service stealing requests. Using this model, we can determine
the intervals in time when the read and write phases of the reorganization actions will be scheduled, given
the moment in time that the load balance is observed.


2 Data Redistribution in Parallel Disk Systems 

We have implemented an intelligent file manager, called FIVE, for parallel disk systems that can perform 
striping on a file-specific or global basis as desired by the application, and in addition achieves load balancing 
by judicious file allocation and redistribution of data [4, 5]. In order to perform load balancing, our file system
keeps track of the following related statistics [5]: 

• the heat of files (or extents, i.e., smallest units of data migration) and disks, where the heat is de- 
termined as the number of block accesses to a file or disk per time unit, as measured by statistical 
observation over a moving window of a certain length. 

• the temperature of files (extents), which is defined as the ratio between heat and size.

An extent is a file fragment which consists of all the striping units of a file that are allocated to the same
disk in a single allocation unit [7]. Note that the heat metric captures the number of block accesses due to
regular requests, and thus we obtain the following relationship between the heat of a disk, $H$, and the
mean arrival rate of regular requests, $\lambda_r$:

$$\lambda_r = \frac{H}{\bar{R}} \qquad (1)$$

where $\bar{R}$ is the average request size measured in terms of the number of blocks accessed.
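The three statistics above reduce to simple ratios. The following is a minimal sketch, assuming that block accesses are counted over a fixed observation window; the function names and the sample numbers are illustrative and do not come from the file system described here.

```python
def heat(block_accesses: int, window_length: float) -> float:
    """Heat: block accesses per time unit over the observation window."""
    return block_accesses / window_length


def temperature(heat_value: float, size_in_blocks: float) -> float:
    """Temperature: ratio between heat and size."""
    return heat_value / size_in_blocks


def regular_arrival_rate(disk_heat: float, avg_request_size: float) -> float:
    """Equation (1): arrival rate of regular requests = disk heat / average request size."""
    return disk_heat / avg_request_size


# Hypothetical numbers: 1200 block accesses in a 60 s window on a 400-block extent,
# with requests that touch 4 blocks on average.
h = heat(1200, 60.0)
print(h, temperature(h, 400), regular_arrival_rate(h, 4.0))
```

A small, very hot extent has a high temperature, which is why temperature rather than heat serves as the selection criterion for migration.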

The above formula assumes that the access patterns of files, hence disks, are fixed over time. In practice,
we encounter many environments which exhibit periodic, predictable access patterns. In our model for
database reorganization these periodic patterns can be incorporated by identifying a number of intervals
such that the heat of a file stays constant within an interval, but is allowed to vary across intervals. As in [4],
we define the weighted heat of file $k$ as:


$$WFH_k = \sum_{j=1}^{n} H_{k,j} \times (t_j - t_{j-1}) \qquad (2)$$

where

$n$ is the number of intervals

$t_j - t_{j-1}$ is the length of interval $j$

$H_{k,j}$ is the heat of file $k$ in interval $j$

Correspondingly, the weighted heat of disk i is defined as: 

$$WDH_i = \sum_{j=1}^{n} H_{i,j} \times (t_j - t_{j-1}) \qquad (3)$$

where $H_{i,j}$ is the heat of disk $i$ in interval $j$, computed as the accumulated heat in interval $j$ of all
files that reside on disk $i$. Note that the arrival rate $\lambda_r$ is also a function of the interval in time: we assume that
$\bar{R}$, the average request size, is constant across all intervals, but as the heat of the disk changes, we now
obtain $\lambda_{r,m} = H_m / \bar{R}$ for $m = 1, \ldots, n$.
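As an illustration of the weighted heat definitions, the sketch below computes the weighted heat of a file and of a disk from per-interval heats. It assumes the interval boundaries are given as a list `[t_0, ..., t_n]` and that per-interval heats are stored in plain lists; these data layouts and the sample values are assumptions made for the example.

```python
def weighted_heat(heats, boundaries):
    """Sum of heat in interval j times the length of interval j.

    heats[j] is the heat in the (j+1)-th interval; boundaries is [t_0, ..., t_n].
    """
    return sum(h * (boundaries[j + 1] - boundaries[j]) for j, h in enumerate(heats))


def weighted_disk_heat(file_heats_on_disk, boundaries):
    """Disk heat per interval is the accumulated heat of the files residing on the disk."""
    n = len(boundaries) - 1
    disk_heats = [sum(f[j] for f in file_heats_on_disk) for j in range(n)]
    return weighted_heat(disk_heats, boundaries)


# Hypothetical daily cycle with two intervals: hours 0-8 and hours 8-24.
boundaries = [0, 8, 24]
file_a = [10, 2]   # busy in the first interval only
file_b = [5, 5]    # steady load
print(weighted_heat(file_a, boundaries))                 # 10*8 + 2*16 = 112
print(weighted_disk_heat([file_a, file_b], boundaries))  # 112 + 120 = 232
```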

2.1 Temporal Disk Cooling Algorithm 

In order to perform dynamic heat redistribution we employ in our system a dynamic load balancing procedure
called disk cooling. Basically, disk cooling is a greedy procedure which tries to determine the best candidate,
i.e., the file (or extent), to remove from the most utilized disk, i.e., the disk with the highest weighted heat, in order
to minimize the amount of data being moved while obtaining the maximum gain. The (weighted) temperature
metric is used as the criterion for selecting the files (extents) to be reallocated because temperature reflects the
benefit/cost ratio of the reallocation. The file to be moved is reallocated to the disk with the lowest weighted
heat. In the case of an extent, in order to facilitate intra-request parallelism, an additional constraint is
observed, namely that the target disk should not already hold an extent of the corresponding file.

Figure 1: Impact of cooling on "hot" disk
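The greedy selection step can be sketched as follows. This is a simplified reading of the description above, not the daemon's actual code: the `Extent` record, the `pick_cooling_move` name, and the choice to fall back to the next-hottest extent when no eligible target exists are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    file_id: str
    size: float           # in blocks
    weighted_heat: float  # weighted heat of this extent


def pick_cooling_move(disks):
    """disks maps disk id -> list of Extent. Returns (extent, source, target) or None."""
    load = {d: sum(e.weighted_heat for e in exts) for d, exts in disks.items()}
    source = max(load, key=load.get)  # most utilized disk
    # Consider candidates in order of decreasing temperature (heat per block).
    by_temperature = sorted(disks[source],
                            key=lambda e: e.weighted_heat / e.size, reverse=True)
    for ext in by_temperature:
        # The target must not already hold an extent of the same file.
        eligible = [d for d in disks
                    if d != source and all(o.file_id != ext.file_id for o in disks[d])]
        if eligible:
            return ext, source, min(eligible, key=load.get)  # coolest eligible disk
    return None


disks = {
    "d0": [Extent("a", 10, 100.0), Extent("b", 100, 200.0)],
    "d1": [Extent("a", 10, 5.0)],
    "d2": [Extent("c", 10, 50.0)],
}
ext, src, dst = pick_cooling_move(disks)
print(ext.file_id, src, dst)  # a d0 d2
```

In the example the coldest disk `d1` is skipped because it already holds an extent of file `a`, so the move goes to `d2` instead.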

In our system the disk cooling procedure is implemented as a background daemon which is invoked at 
fixed intervals of time. The procedure checks first if a given trigger condition is satisfied or not [5]. If the 
trigger condition is false, the system is considered load balanced and no cooling action is performed. 

A cooling action will be executed only if our estimate of its benefit exceeds its additional cost, with both
measures taking into account the temporal access pattern. In order to estimate the cost/benefit of a cooling
action we make use of the weighted disk heat variance (WDHV) as an explicit objective function [4]. WDHV
is defined as follows:

$$WDHV = \sum_{j=1}^{n} \sum_{i=1}^{D} (H_{i,j} - \bar{H}_j)^2 \times (t_j - t_{j-1}) \qquad (4)$$

where

$D$ is the number of disks in the parallel system

$\bar{H}_j$ is the mean disk heat (over all disks) in interval $j$

The benefit of the cooling action is measured by examining the load balance of the system before and
after the potential reorganization. This benefit, denoted by $B$, is computed as the difference
$WDHV_{curr} - WDHV_{future}$, where $WDHV_{curr}$ is the weighted disk heat variance before the potential cooling process
and $WDHV_{future}$ is the weighted disk heat variance that potentially would result if the extent were to be
moved to the target disk. In order for the cooling to be scheduled, its benefit $B$ must exceed the extra
cost, denoted by $E$, introduced by the reorganization process itself. The cooling process is executed in two
steps, the first corresponding to the read phase of the action, when the hot disk is accessed, and the second to the write
phase of the action, during which the cold target disk is accessed. The read (and write) phases introduce an
additional amount of heat on the source and target disks which can be computed by dividing the size of the
file (extent) to be moved by the corresponding duration of the phase. The read phases correspond to I/O
stealing requests and, as discussed in Section 3, the response time of an I/O stealing request is equal to its
service time, denoted by $1/\mu_s$. Thus the duration of a read phase is estimated as $1/\mu_s$. Figure 1 illustrates
the temporal heat changes on the source disk with and without cooling. The permanent heat reduction
due to the read phase is already accounted for in the benefit $B$; on the other hand, to determine the extra
cost (temporary heat increase in Figure 1) we also need to determine the interval in time when the cooling
started.
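The objective function and the benefit $B$ can be sketched as below. The sketch assumes that disk heats are stored as a matrix indexed by disk and interval, and that moving an extent shifts its per-interval heat from the source row to the target row; the function names and sample values are illustrative only.

```python
def wdhv(heat, boundaries):
    """Weighted disk heat variance.

    heat[i][j] is the heat of disk i in the (j+1)-th interval;
    boundaries is [t_0, ..., t_n].
    """
    num_disks = len(heat)
    total = 0.0
    for j in range(len(boundaries) - 1):
        mean_j = sum(heat[i][j] for i in range(num_disks)) / num_disks
        squared_dev = sum((heat[i][j] - mean_j) ** 2 for i in range(num_disks))
        total += squared_dev * (boundaries[j + 1] - boundaries[j])
    return total


def benefit(heat, boundaries, extent_heat, source, target):
    """B = WDHV before the move minus WDHV after the extent is on the target disk."""
    future = [row[:] for row in heat]
    for j, h in enumerate(extent_heat):
        future[source][j] -= h
        future[target][j] += h
    return wdhv(heat, boundaries) - wdhv(future, boundaries)


# Hypothetical two-disk system with two intervals of length 1 and 2.
heat = [[10.0, 6.0], [2.0, 2.0]]
boundaries = [0, 1, 3]
print(wdhv(heat, boundaries))                        # 32*1 + 8*2 = 48.0
print(benefit(heat, boundaries, [4.0, 2.0], 0, 1))   # perfectly balanced afterwards: 48.0
```

The cooling action would then be scheduled only when this value exceeds the extra cost $E$.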


2.2 Estimating the intervals of the cooling action 

Assuming that the cooling daemon is invoked at time now, an iterative procedure is invoked in order to
determine the intervals in time, denoted by $m$ and $n$, when the read and write phases will actually be
scheduled and executed at the corresponding service queues. Let us assume that the cooling daemon is
invoked during time interval $j$, i.e., $t_{j-1} < now < t_j$. Using the mean arrival rate of regular requests
during interval $j$, $\lambda_{r,j}$, and the arrival rate of the disk cooling requests, $\lambda_s$, the approximate queueing model
developed in Section 3 is used first to determine $\lambda_{eff}$, the effective arrival rate of the read actions of the
corresponding cooling requests. We assume, for simplicity, that the trigger condition is always satisfied, i.e.,
some heat imbalance is always present. Thus, the mean arrival rate of disk cooling requests in our system,
which correspond to the service stealing requests in the queueing model, is fixed and can be calculated as:

$$\lambda_s = \frac{1}{\text{time between successive daemon invocations}} \qquad (5)$$

The interval $m$ where the read part of the cooling request would be scheduled is determined by
the following iterative procedure. Notice that this procedure may require recomputing the value of $\lambda_{eff}$.


compute $\lambda_{eff}$ using equation (9);
while (interval = NOT_FOUND) do

• Case 1: ($t_{j-1} < (now + 1/\lambda_{eff}) < t_j$):

m := j; interval := FOUND;

• Case 2: (($t_j < (now + 1/\lambda_{eff}) < t_{j+1}$) and ($\lambda_{r,j+1} < \lambda_{r,j}$)):

m := j + 1; interval := FOUND;

• Else:

Reiterate procedure to compute $\lambda_{eff}$ using $\lambda_{r,j+1}$ and $\lambda_s$.

Recompute now := now + ($l \times 1/\lambda_s$), where $l = \min\{k \mid now + (k \times 1/\lambda_s) > t_j\}$;

endwhile


The computation of the interval $n$, where the write phase of the cooling request would be scheduled, is
substantially simpler. Since the target disk is cool we can schedule the corresponding reorganization request
as soon as possible. Thus, if the read phase was executed in interval $m$, the write phase will be scheduled
in the same interval $m$ or in interval $m + 1$ [9].

Having determined the intervals $m$ and $n$, we can compute $E$, the extra cost due to cooling, as follows.
We add two "dummy" intervals to the load balancing cycle to account for the read and write phases of the
cooling action. During such a dummy interval the heat of each disk, except for the disk which is the subject
of the read or write, correspondingly, is taken to be the same as the heat of the disk during the time interval
when the corresponding read or write phase started. Thus, the terms in $E$ can be computed as follows:

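$$E = \sum_{i=1}^{D} (H'_{i,m} - \bar{H}'_m)^2 \times read\_duration + \sum_{i=1}^{D} (H'_{i,n} - \bar{H}'_n)^2 \times write\_duration$$

where

$$H'_{i,m} = \begin{cases} H_{i,m} + extent\_size / read\_duration, & \text{if } i = s \\ H_{i,m}, & \text{otherwise} \end{cases}$$

$$H'_{i,n} = \begin{cases} H_{i,n} + extent\_size / write\_duration, & \text{if } i = t \\ H_{i,n}, & \text{otherwise} \end{cases}$$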

More details of our temporal disk cooling procedure are given in [9]. 
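The extra cost computation can be illustrated with the sketch below. It rests on two assumptions that the text does not spell out: that the mean heat in each dummy interval is taken over the adjusted heats (including the temporary increase on the source or target disk), and that $s$ and $t$ index the source and target disks. The function name and the sample values are hypothetical.

```python
def extra_cost(heat_m, heat_n, source, target,
               extent_size, read_duration, write_duration):
    """Extra cost E of a cooling action over the two dummy intervals.

    heat_m[i]: heat of disk i in the interval where the read phase starts.
    heat_n[i]: heat of disk i in the interval where the write phase starts.
    """
    def dummy_interval_term(heats, disk, duration):
        adjusted = list(heats)
        # Temporary heat added by the migration on the affected disk.
        adjusted[disk] += extent_size / duration
        mean = sum(adjusted) / len(adjusted)  # assumption: mean over adjusted heats
        return sum((h - mean) ** 2 for h in adjusted) * duration

    return (dummy_interval_term(heat_m, source, read_duration)
            + dummy_interval_term(heat_n, target, write_duration))


# Hypothetical values: an 8-block extent read in 2 time units and written in 4.
print(extra_cost([10.0, 2.0], [10.0, 2.0], source=0, target=1,
                 extent_size=8.0, read_duration=2.0, write_duration=4.0))  # 144 + 72 = 216.0
```

The read term is larger here because the temporary heat lands on the disk that is already the hottest, which widens the imbalance for the duration of the read.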


3 An Approximate Queueing Model for I/O Service Stealing 

I/O service stealing requests are issued periodically by the reorganization process whenever a load imbalance
is observed, and they correspond to the read phases of the cooling actions. In this section we present an
approximate queueing model for deriving the overall utilization and effective arrival rate, $\lambda_{eff}$, of I/O service
stealing requests in a two class system consisting of regular and reorganization requests. The behavior of
the two classes of requests is characterized as follows:

1. regular requests: these requests have a mean arrival rate $\lambda_r$. The interarrival time of these requests is
assumed to be exponentially distributed. The mean service rate of these requests is given by $\mu_r$.

2. I/O service stealing requests: these lower priority requests are issued periodically by an incremental
reorganization process. We assume a constant interarrival time $1/\lambda_s$ and a mean service rate $\mu_s$ for
these requests.

For I/O service stealing requests two additional restrictions apply:

1. If an I/O service stealing request arrives and the service queue is not empty, the request is disregarded
by the scheduler of the queue.

2. I/O service stealing requests are synchronous, i.e., a new I/O service stealing request is not enqueued
until the execution of the previous one is finished. Thus, at any point in time there is at most one I/O
service stealing request in the system.

From the discussion above it is clear that the response time of an I/O service stealing request is equal to
its service time, i.e., $1/\mu_s$. We proceed now to derive the formulae for the effective arrival rate of I/O service
stealing requests, $\lambda_{eff}$, as seen by the service center, and the overall system utilization, $\rho$.

The probability that an I/O service stealing request finds the service queue empty is given by $1 - \rho$.
Thus, we obtain:

$$\lambda_{eff} = (1 - \rho) \times \lambda_s \qquad (6)$$

This is in effect a recursive formula since $\rho$ depends on $\lambda_{eff}$. In order to eliminate the inherent recursion
in formula (6), we adopt an approximation now and treat the system as a regular M/G/1 queue with two
classes of prioritized requests [1]: regular requests have high priority, while the service stealing requests have
low priority. Thus, we assume that the interarrival times for both regular requests and I/O service stealing
requests are exponentially distributed. Note that in our actual system implementation the stealing requests
have constant interarrival times (see equation (5)).

The utilization $\rho_i$ due to requests with priority $i$ in an M/G/1 queue with multiple priority classes is given by:

$$\rho_i = \lambda_i / \mu_i \qquad (7)$$


Furthermore, under the exponential interarrival times assumption, the overall utilization $\rho$ of the system can
be expressed as the sum:

$$\rho = \rho_r + \rho_s = \frac{\lambda_r}{\mu_r} + \frac{\lambda_{eff}}{\mu_s} \qquad (8)$$

Note that $\rho$ depends only on the mean arrival and service rates of the two classes of requests, and is
independent of the service time distributions. From equation (8) we obtain:

$$\lambda_{eff} = \left( \rho - \frac{\lambda_r}{\mu_r} \right) \times \mu_s \qquad (9)$$


Finally, substitution of equation (9) into equation (6) yields: 

$$\rho = \frac{\lambda_s + \frac{\lambda_r}{\mu_r} \times \mu_s}{\lambda_s + \mu_s} \qquad (10)$$


In [9] we report on an experimental validation of this model and show that the maximum error of $\lambda_{eff}$
ranges from 1% to 5% depending upon the arrival rates $\lambda_r$ and $\lambda_s$ of regular and I/O stealing requests.
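The closed form can be checked numerically against the recursive definition. The sketch below computes $\rho$ from equation (10) and $\lambda_{eff}$ from equation (6), then compares the result with a direct fixed-point iteration of equations (6) and (8). The function names and the sample rates are illustrative assumptions; the iteration is only expected to converge when $\lambda_s < \mu_s$.

```python
def utilization(lam_r, mu_r, lam_s, mu_s):
    """Equation (10): overall utilization in closed form."""
    return (lam_s + (lam_r / mu_r) * mu_s) / (lam_s + mu_s)


def effective_arrival_rate(lam_r, mu_r, lam_s, mu_s):
    """Equation (6) with rho taken from equation (10)."""
    return (1.0 - utilization(lam_r, mu_r, lam_s, mu_s)) * lam_s


def effective_arrival_rate_iterative(lam_r, mu_r, lam_s, mu_s, iterations=50):
    """Resolve the recursion between equations (6) and (8) by fixed-point iteration."""
    rho = lam_r / mu_r  # start from the utilization due to regular requests only
    for _ in range(iterations):
        lam_eff = (1.0 - rho) * lam_s          # equation (6)
        rho = lam_r / mu_r + lam_eff / mu_s    # equation (8)
    return (1.0 - rho) * lam_s


# Hypothetical rates (requests per second).
lam_r, mu_r, lam_s, mu_s = 20.0, 50.0, 2.0, 40.0
closed = effective_arrival_rate(lam_r, mu_r, lam_s, mu_s)
rho = utilization(lam_r, mu_r, lam_s, mu_s)
print(closed)                              # approximately 1.142857 (= 8/7)
print((rho - lam_r / mu_r) * mu_s)         # equation (9) gives the same value
print(effective_arrival_rate_iterative(lam_r, mu_r, lam_s, mu_s))
```

With these rates roughly 57% of the issued stealing requests find the queue empty and are actually served.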